Parallelizing dense and banded linear algebra libraries using SMPSs
Abstract
The promise of future many-core processors, with hundreds of threads running concurrently, has led the developers of linear algebra libraries to rethink their designs in order to extract more parallelism, further exploit data locality, attain a better load balance, and pay careful attention to the critical path of computation. In this paper we describe how existing serial libraries like (C-)LAPACK and FLAME can be easily parallelized using the SMPSs tools, consisting of a few OpenMP-like pragmas and a run-time system. In the LAPACK case, this usually requires the development of blocked algorithms for simple BLAS-level operations, which expose concurrency at a finer grain. For better performance, our experimental results indicate that column-major order, as employed by this library, needs to be abandoned in favor of a block data layout. This will require a deeper rewrite of LAPACK or, alternatively, a dynamic conversion of the storage pattern at run-time. The parallelization of FLAME routines using SMPSs is quite simple, as this library includes blocked algorithms (or algorithms-by-blocks, in FLAME parlance) for most operations, and storage-by-blocks (block data layout) is already in place.
Similar resources
Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs
In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major difficulties in adapting the code for this operation in ScaLAPACK to SMPSs lie in algorithmic restrictions and the semantics of the SMPSs programming model, but also that they bo...
Case studies on the development of ScaLAPACK and the NAG Numerical PVM Library
In this paper we look at the development of ScaLAPACK, a software library for dense and banded numerical linear algebra, and the NAG Numerical PVM Library, which includes software for dense and sparse linear algebra, quadrature, optimization and random number generation. Both libraries are aimed at distributed memory machines, including networks of workstations. The paper concentrates on the un...
Application Interface to Parallel Dense Matrix Libraries: Just let me solve my problem!
We focus on how applications that lead to large dense linear systems naturally build matrices. This allows us explain why traditional interfaces to dense linear algebra libraries for distributed memory architectures, which evolved from sequential linear algebra libraries, inherently do not support applications well. We review the application interface that has been supported by the Parallel Lin...
CRPC Research into Linear Algebra Software for High Performance Computers
In this paper we look at a number of approaches being investigated in the Center for Research on Parallel Computation (CRPC) to develop linear algebra software for high-performance computers. These approaches are exemplified by the LAPACK, templates, and ARPACK projects. LAPACK is a software library for performing dense and banded linear algebra computations, and was designed to run efficiently on...
Generic Programming for High Performance Numerical Linear Algebra
We present a generic programming methodology for expressing data structures and algorithms for high-performance numerical linear algebra. As with the Standard Template Library [14], our approach explicitly separates algorithms from data structures, allowing a single set of numerical routines to operate with a wide variety of matrix types, including sparse, dense, and banded. Through the use of ...
Journal:
- Concurrency and Computation: Practice and Experience
Volume 21, Issue –
Pages –
Publication date: 2009